NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

"A mathematical theory of evolution": phylogenetic models dating back 100 years

https://doi.org/10.1098/rstb.2023.0297

Rosenberg, Noah A; Stadler, Tanja; Steel, Mike (February 2025, Philosophical Transactions of the Royal Society B: Biological Sciences)

Free, publicly-accessible full text available February 13, 2026
Bijections between the multifurcating unlabeled rooted trees and the positive integers

https://doi.org/10.1016/j.aam.2023.102612

Maranca, Alessandra_Rister Portinari; Rosenberg, Noah A (February 2024, Advances in Applied Mathematics)

Colijn and Plazzotta (2018) [1] described a bijective scheme for associating the unlabeled bifurcating rooted trees with the positive integers. In mathematical and biological applications of unlabeled rooted trees, however, nodes of rooted trees are sometimes multifurcating rather than bifurcating. Building on the bijection between the unlabeled bifurcating rooted trees and the positive integers, we describe bijective schemes for associating the unlabeled multifurcating rooted trees with the positive integers. We devise bijections with the positive integers for a set of trees in which each non-leaf node has exactly k child nodes, and for a set of trees in which each non-leaf node has at most k child nodes. The calculations make use of Macaulay's binomial expansion formula. The generalization to multifurcating trees can assist with the use of unlabeled trees for applications in evolutionary biology, such as the measurement of phylogenetic patterns of genetic lineages in pathogens.
more » « less
Full Text Available
Enumeration of Rooted Binary Unlabeled Galled Trees

https://doi.org/10.1007/s11538-024-01270-8

Agranat-Tamir, Lily; Mathur, Shaili; Rosenberg, Noah A. (March 2024, Bulletin of Mathematical Biology)

Abstract Rooted binarygalledtrees generalize rooted binary trees to allow a restricted class of cycles, known asgalls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees withnleaves to enumerate rooted binary unlabeled galled trees withnleaves, also enumerating rooted binary unlabeled galled trees withnleaves andggalls,$$0 \leqslant g \leqslant \lfloor \frac{n-1}{2} \rfloor $$ $0 ⩽ g ⩽ ⌊ \frac{n - 1}{2} ⌋$ . The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binarynormalunlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for alln. We show that the number of rooted binary unlabeled galled trees grows with$$0.0779(4.8230^n)n^{-\frac{3}{2}}$$ $0.0779 (4 . 8230^{n}) n^{- \frac{3}{2}}$ , exceeding the growth$$0.3188(2.4833^n)n^{-\frac{3}{2}}$$ $0.3188 (2 . 4833^{n}) n^{- \frac{3}{2}}$ of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term,$$0.3910n^{\frac{1}{2}}$$ $0.3910 n^{\frac{1}{2}}$ compared to$$0.3188n^{-\frac{3}{2}}$$ $0.3188 n^{- \frac{3}{2}}$ . For a fixed number of leavesn, the number of gallsgthat produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of$$g=0$$ $g = 0$ and the maximum of$$g=\lfloor \frac{n-1}{2} \rfloor $$ $g = ⌊ \frac{n - 1}{2} ⌋$ . We discuss implications in mathematical phylogenetics.
more » « less
All galls are divided into three or more parts: recursive enumeration of labeled histories for galled trees

https://doi.org/10.1186/s13015-023-00224-4

Mathur, Shaili; Rosenberg, Noah A. (December 2023, Algorithms for Molecular Biology)

Abstract Objective In mathematical phylogenetics, a labeled rooted binary tree topology can possess any of a number of labeled histories, each of which represents a possible temporal ordering of its coalescences. Labeled histories appear frequently in calculations that describe the combinatorics of phylogenetic trees. Here, we generalize the concept of labeled histories from rooted phylogenetic trees to rooted phylogenetic networks, specifically for the class of rooted phylogenetic networks known as rooted galled trees . Results Extending a recursive algorithm for enumerating the labeled histories of a labeled tree topology, we present a method to enumerate the labeled histories associated with a labeled rooted galled tree. The method relies on a recursive decomposition by which each gall in a galled tree possesses three or more descendant subtrees. We exhaustively provide the numbers of labeled histories for all small galled trees, finding that each gall reduces the number of labeled histories relative to a specified galled tree that does not contain it. Conclusion The results expand the set of structures for which labeled histories can be enumerated, extending a well-known calculation for phylogenetic trees to a class of phylogenetic networks.
more » « less
Full Text Available
Asymptotic Enumeration of Rooted Binary Unlabeled Galled Trees with a Fixed Number of Galls

https://doi.org/10.4230/LIPIcs.AofA.2024.27

Agranat-Tamir, Lily; Fuchs, Michael; Gittenberger, Bernhard; Rosenberg, Noah A (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Mailler, Cécile; Wild, Sebastian (Ed.)
Galled trees appear in problems concerning admixture, horizontal gene transfer, hybridization, and recombination. Building on a recursive enumerative construction, we study the asymptotic behavior of the number of rooted binary unlabeled (normal) galled trees as the number of leaves n increases, maintaining a fixed number of galls g. We find that the exponential growth with n of the number of rooted binary unlabeled normal galled trees with g galls has the same value irrespective of the value of g ≥ 0. The subexponential growth, however, depends on g; it follows c_g n^{2g-3/2}, where c_g is a constant dependent on g. Although for each g, the exponential growth is approximately 2.4833ⁿ, summing across all g, the exponential growth is instead approximated by the much larger 4.8230ⁿ.
more » « less
Full Text Available
Cultural transmission of move choice in chess

https://doi.org/10.1098/rspb.2023.1634

Lappo, Egor; Rosenberg, Noah A; Feldman, Marcus W (November 2023, Proceedings of the Royal Society B: Biological Sciences)

The study of cultural evolution benefits from detailed analysis of cultural transmission in specific human domains. Chess provides a platform for understanding the transmission of knowledge due to its active community of players, precise behaviours and long-term records of high-quality data. In this paper, we perform an analysis of chess in the context of cultural evolution, describing multiple cultural factors that affect move choice. We then build a population-level statistical model of move choice in chess, based on the Dirichlet-multinomial likelihood, to analyse cultural transmission over decades of recorded games played by leading players. For moves made in specific positions, we evaluate the relative effects of frequency-dependent bias, success bias and prestige bias on the dynamics of move frequencies. We observe that negative frequency-dependent bias plays a role in the dynamics of certain moves, and that other moves are compatible with transmission under prestige bias or success bias. These apparent biases may reflect recent changes, namely the introduction of computer chess engines and online tournament broadcasts. Our analysis of chess provides insights into broader questions concerning how social learning biases affect cultural evolution.
more » « less
Full Text Available
A Mathematical Connection Between Single-Elimination Sports Tournaments and Evolutionary Trees

https://doi.org/10.1080/0025570X.2023.2266389

King, Matthew C; Rosenberg, Noah A (October 2023, Mathematics Magazine)

How many ways are there to arrange the sequence of games in a single-elimination sports tournament? We consider the connection between this enumeration problem and the enumeration of “labeled histories,” or sequences of asynchronous branching events, in mathematical phylogenetics. The possibility of playing multiple games simultaneously in different arenas suggests an extension of the enumeration of labeled histories to scenarios in which multiple branching events occur simultaneously. We provide a recursive result enumerating game sequences and labeled histories in which simultaneity is allowed. For a March Madness basketball tournament of 68 labeled teams, the number of possible sequences of games is ∼1.91×10^78 if arbitrarily many arenas are available, but only ∼3.60×10^68 if all games must be played sequentially in the same arena.
more » « less
Full Text Available
A Dirichlet Model of Alignment Cost in Mixed-Membership Unsupervised Clustering

https://doi.org/10.1080/10618600.2022.2127739

Liu, Xiran; Kopelman, Naama M; Rosenberg, Noah A (July 2023, Journal of Computational and Graphical Statistics)

Mixed-membership unsupervised clustering is widely used to extract informative patterns from data in many application areas. For a shared dataset, the stochasticity and unsupervised nature of clustering algorithms can cause difficulties in comparing clustering results produced by different algorithms, or even multiple runs of the same algorithm, as outcomes can differ owing to permutation of the cluster labels or genuine differences in clustering results. Here, with a focus on inference of individual genetic ancestry in population-genetic studies, we study the cost of misalignment of mixed-membership unsupervised clustering replicates under a theoretical model of cluster memberships. Using Dirichlet distributions to model membership coefficient vectors, we provide theoretical results quantifying the alignment cost as a function of the Dirichlet parameters and the Hamming permutation difference between replicates. For fixed Dirichlet parameters, the alignment cost is seen to increase with the Hamming distance between permutations. Datasets with low variance across individuals of membership coefficients for specific clusters generally produce high misalignment costs—so that a single optimal permutation has far lower cost than suboptimal permutations. Higher variability in data, as represented by greater variance of membership coefficients, generally results in alignment costs that are similar between the optimal permutation and suboptimal permutations. We demonstrate the application of the theoretical results to data simulated under the Dirichlet model, as well as to membership estimates from inference of human-genetic ancestry. The results can contribute to improving cluster alignment algorithms that seek to find optimal permutations of replicates. Supplementary materials for this article are available online.
more » « less
Full Text Available
When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself?

https://doi.org/10.1515/sagmb-2023-0004

Liu, Xiran; Ahsan, Zarif; Martheswaran, Tarun K; Rosenberg, Noah A (January 2023, Statistical Applications in Genetics and Molecular Biology)

Abstract Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity – the mean dissimilarity between randomly chosen members of a population – can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data.
more » « less
Full Text Available
Mathematical constraints on F _ST : multiallelic markers in arbitrarily many populations

https://doi.org/10.1098/rstb.2020.0414

Alcala, Nicolas; Rosenberg, Noah A. (June 2022, Philosophical Transactions of the Royal Society B: Biological Sciences)

Interpretations of values of the F ST measure of genetic differentiation rely on an understanding of its mathematical constraints. Previously, it has been shown that F ST values computed from a biallelic locus in a set of multiple populations and F ST values computed from a multiallelic locus in a pair of populations are mathematically constrained as a function of the frequency of the allele that is most frequent across populations. We generalize from these cases to report here the mathematical constraint on F ST given the frequency M of the most frequent allele at a multiallelic locus in a set of multiple populations. Using coalescent simulations of an island model of migration with an infinitely-many-alleles mutation model, we argue that the joint distribution of F ST and M helps in disentangling the separate influences of mutation and migration on F ST . Finally, we show that our results explain a puzzling pattern of microsatellite differentiation: the lower F ST in an interspecific comparison between humans and chimpanzees than in the comparison of chimpanzee populations. We discuss the implications of our results for the use of F ST . This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
more » « less
Full Text Available

« Prev Next »

Search for: All records